Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases

Authors

  • Fabien Navarro
  • Adrien Saumard
Abstract

We investigate the optimality for model selection of the so-called slope heuristics, V-fold cross-validation and V-fold penalization in a heteroscedastic regression context with random design. We consider a new class of linear models, called strongly localized bases, which generalizes histograms, piecewise polynomials and compactly supported wavelets. We derive sharp oracle inequalities that prove the asymptotic optimality of the slope heuristics (when the optimal penalty shape is known) and of V-fold penalization. Furthermore, V-fold cross-validation seems to be suboptimal for a fixed value of V, since it asymptotically recovers the oracle learned from a sample size equal to a fraction 1 − 1/V of the original amount of data. Our results are based on genuine concentration inequalities for the true and empirical excess risks, which are of independent interest. Our experiments show the good behavior of the slope heuristics for the selection of linear wavelet models, and that V-fold cross-validation and V-fold penalization have comparable efficiency.

AMS 2000 subject classifications: 62G08, 62G09
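The slope heuristics calibrate the multiplicative constant in front of a known penalty shape from the data: one locates the minimal penalty, typically through a sharp jump in the dimension of the selected model, and then doubles it. Below is a minimal, self-contained Python sketch of this dimension-jump calibration on synthetic risk values; the toy risk curve and all names are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np

def dimension_jump(dims, emp_risk, n, kappas=None):
    """Scan kappa, record the dimension selected by the penalized
    criterion emp_risk + kappa * D / n, and return the kappa at which
    the selected dimension drops sharply (the 'dimension jump')."""
    dims, emp_risk = np.asarray(dims, float), np.asarray(emp_risk, float)
    if kappas is None:
        kappas = np.linspace(0.0, 5.0, 2001)
    selected = np.array([dims[np.argmin(emp_risk + k * dims / n)]
                         for k in kappas])
    jump = np.argmax(np.abs(np.diff(selected)))  # largest one-step drop
    return kappas[jump + 1]

# Toy risk curve: approximation error vanishing past dimension 20,
# plus an overfitting term of order -D/n and a little noise.
rng = np.random.default_rng(0)
n, dims = 500, np.arange(1, 101)
bias = np.where(dims < 20, 1.0 / dims - 1.0 / 20, 0.0)
emp_risk = bias - dims / n + 1e-3 * rng.standard_normal(dims.size)

kappa_min = dimension_jump(dims, emp_risk, n)
pen = 2.0 * kappa_min * dims / n              # the "times two" rule
m_hat = dims[np.argmin(emp_risk + pen)]
print(f"kappa_min ~ {kappa_min:.2f}, selected dimension = {m_hat}")
```

On this toy curve the jump occurs near kappa = 1 (the noise level), and the doubled penalty selects a dimension close to the oracle one.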

Similar articles

Model selection by resampling penalization

We present a new family of model selection algorithms based on the resampling heuristics. It can be used in several frameworks, does not require any knowledge about the unknown law of the data, and may be seen as a generalization of local Rademacher complexities and V-fold cross-validation. In the case example of least-squares regression on histograms, we prove oracle inequalities, and that these ...
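From memory of that line of work (hedged: the exact weights and normalizing constants are as in the cited paper), the resampling penalty replaces the unknown ideal penalty by a resampled estimate,

$$ \mathrm{pen}(m) \;=\; C_W\,\mathbb{E}_W\Big[\,P_n\,\gamma\big(\widehat{s}^{\,W}_m\big) \;-\; P^{W}_n\,\gamma\big(\widehat{s}^{\,W}_m\big)\Big], $$

where $P_n^W$ is the empirical distribution reweighted by exchangeable resampling weights $W$, $\widehat{s}^{\,W}_m$ is the estimator trained on the reweighted sample, $\gamma$ is the least-squares contrast, and $C_W$ is a constant depending only on the weights.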

Slope Heuristics for Heteroscedastic Regression on a Random Design

In a recent paper [BM06], Birgé and Massart have introduced the notion of minimal penalty in the context of penalized least squares for Gaussian regression. They have shown that for several model selection problems, simply multiplying the minimal penalty by 2 leads to a (nearly) optimal penalty, in the sense that it approximately minimizes the resulting oracle inequality. Interestingly, the m...
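In symbols, the slope heuristics assert the standard relation

$$ \mathrm{pen}_{\mathrm{opt}}(m) \;\approx\; 2\,\mathrm{pen}_{\min}(m), $$

so that once a minimal constant $\widehat{\kappa}_{\min}$ has been detected (for instance through the dimension jump), the selected model is $\widehat{m} \in \arg\min_{m} \{ P_n\gamma(\widehat{s}_m) + 2\,\widehat{\kappa}_{\min} D_m / n \}$, with $D_m$ the dimension of model $m$.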

Suboptimality of penalties proportional to the dimension for model selection in heteroscedastic regression

We consider the problem of choosing between several models in least-squares regression with heteroscedastic data. We prove that any penalization procedure is suboptimal when the penalty is proportional to the dimension of the model, at least for some typical heteroscedastic model selection problems. In particular, Mallows’ Cp is suboptimal in this framework, as well as any “linear” penalty depe...
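To make the claim concrete (a sketch in standard notation, not a quotation from the paper): Mallows' Cp uses the dimension-proportional penalty

$$ \mathrm{pen}_{C_p}(m) \;=\; \frac{2\sigma^2 D_m}{n}, $$

whereas the ideal penalty is $\mathrm{pen}_{\mathrm{id}}(m) = P\gamma(\widehat{s}_m) - P_n\gamma(\widehat{s}_m)$. When the noise level $\sigma^2(x)$ varies with the design point $x$, $\mathbb{E}[\mathrm{pen}_{\mathrm{id}}(m)]$ is in general no longer a function of $D_m$ alone, so no penalty of the form $\kappa D_m$ can match it simultaneously over all models.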

V-fold cross-validation improved: V-fold penalization

We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call "V-fold penalization". Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it "overpenalizes" all the more that V is large. Hence, asymptotic opti...
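As a rough sketch of the construction described above (the exact normalizing constant is as in the cited paper), the V-fold penalty estimates the ideal penalty from the V data blocks,

$$ \mathrm{pen}_{\mathrm{VF}}(m) \;=\; \frac{C}{V}\sum_{j=1}^{V}\Big[\,P_n\,\gamma\big(\widehat{s}^{\,(-j)}_m\big) \;-\; P^{(-j)}_n\,\gamma\big(\widehat{s}^{\,(-j)}_m\big)\Big], $$

where $\widehat{s}^{\,(-j)}_m$ is trained with block $j$ held out, $P_n^{(-j)}$ is the empirical measure of the remaining blocks, and $C$ (of order $V - 1$) is chosen so that the penalty is an approximately unbiased estimate of the ideal one.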

Choosing a penalty for model selection in heteroscedastic regression

Penalization is a classical approach to model selection. In short, penalization chooses the model minimizing the sum of the empirical risk (how well the model fits the data) and some measure of the model's complexity (the penalty); see FPE [1], AIC [2], Mallows' Cp or CL [22]. A huge amount of literature exists about penalties proportional to the dimension of the model in regression, showing...
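In symbols, the penalized criterion selects

$$ \widehat{m} \;\in\; \arg\min_{m \in \mathcal{M}} \big\{\, P_n\,\gamma(\widehat{s}_m) \;+\; \mathrm{pen}(m) \,\big\}, $$

where $P_n\gamma(\widehat{s}_m)$ is the empirical risk of the estimator on model $m$ and $\mathrm{pen}(m)$ measures the complexity of $m$.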

Publication date: 2017